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Abstract 

This report introduces a new hierarchical Bayesian model for the EEG source 
localization problem. This model promotes structured sparsity to search for fo¬ 
cal brain activity. This sparsity is obtained via a multivariate Bernoulli Laplacian 
prior assigned to the brain activity approximating an £20 pseudo norm regular¬ 
ization in a Bayesian framework. A partially collapsed Gibbs sampler is used to 
draw samples asymptotically distributed according to the posterior associated 
with the proposed Bayesian model. The generated samples are used to esti¬ 
mate the brain activity and the model hyperparameters jointly in an unsuper¬ 
vised framework. Two different kinds of Metropolis-Hastings moves are intro¬ 
duced to accelerate the convergence of the Gibbs sampler. The first move is 
based on multiple dipole shifts within each MGMG chain whereas the second 
one exploits proposals associated with different MGMG chains. We use both syn¬ 
thetic and real data to compare the performance of the proposed method with 
the weighted £21 mixed norm regularization and a method based on a multiple 
sparse prior, showing that our algorithm presents advantages in several scenar¬ 
ios. 


* An extended version of a paper submitted for publication: Bayesian Structured Sparsity Priors for EEG Source 
Localization. 
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1. Introduction 


The EEC source localization problem continues to attract a high level of coverage in the liter¬ 
ature resulting in a wide array of methods developed in the last years. These can be classified 
in two groups: (i) the dipole-fitting models that represent the brain activity as a small num¬ 
ber of dipoles with unknown positions: and (ii) the distributed-source models that represent 
it as a large number of dipoles in fixed positions. Dipole-fitting models (I1I21 try to estimate 
the amplitudes, orientations and positions of a few dipoles that explain the measured data. 
Unfortunately, these models usually provide solutions that vary with the initial guess of the 
number of dipoles and with their initial locations due to the presence of a large number of 
local minima in their cost function |3|. Several algorithms based on MUSIC were developed 
to solve this problem HHZl ■ In addition, sequential Monte Carlo methods were also investi¬ 
gated to estimate the dipole-fitting model parameters O. If the brain activity is composed 
of a small number of clustered sources, the dipole-fitting algorithms are capable of providing 
good results EUT^. However, their performance deteriorates for detecting multiple spatially 
extended sources (3). On the other hand, the distributed-source methods model the brain 
activity as the result of a large number of discrete dipoles with fixed positions and try to es¬ 
timate their amplitudes and orientations 0 . Since the amount of dipoles used in the brain 
model is typically much larger than the amount of electrodes, the inverse problem is ill-posed 
in the sense that there is an infinite amount of brain activities that can justify the measure¬ 
ments 0 . A regularization is thus needed in order to incorporate additional information to 
solve this inverse problem. One of the most simple regularizations consists of penalizing the 
£2 norm of the solution using the minimum norm estimation algorithm (TT) or its variants 
based on the weighted minimum norm: Loreta [12] and sLoreta (131 • However, these meth¬ 
ods have been shown to overestimate the size of the active area if the brain activity is focused 
0 , which is believed to be the case in a number of medical applications. A better way to esti¬ 
mate focal brain activity is to promote sparsity, by applying an £q pseudo norm regularization 
fTTI . Unfortunately, this procedure is known to be intractable in an optimization framework. 
As a consequence, the £q pseudo norm is usually approximated by the £\ norm via convex 
relaxation ca , in spite of the fact that these two approaches do not always provide the same 
solution (T4|. In a previous work, we proposed to combine them in a Bayesian framework 
flBl . using the £q penalty to locate the non-zero positions and the £\ norm to estimate their 
amplitudes. However, this £q h- £\ method, similarly to £q and £\ separately, considers each 
time sample independently leading in some cases to unrealistic solutions (T7| . 

To improve source localization, it is possible to make use of the temporal structure of the data 
by promoting structured sparsity, which is known to yield better results than standard spar¬ 
sity when applied to strongly group sparse signals [18] . Structured sparsity has been shown to 
improve results in several applications including audio restoration [IS] , image analysis (20] 
and machine learning |2T] . One way of applying structured sparsity in EEC source localiza¬ 
tion is to use mixed-norms regularization such as the /21 mixed norm [T7] (also referred to as 
group-lasso). This approach promotes sparsity among different dipoles (via the £\ portion of 
the norm) but groups all the time samples of the same dipole together, forcing them all to be 
either jointly active or inactive (with the £2 norm portion). However, it has several drawbacks 
including the manual tuning of the regularization parameter. 
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In addition to optimization techniques, several approaches have tried to model the time evo¬ 
lution of the dipole activity and estimate it using either Kalman filtering |23 |22l or particle 
filters (24H^ . Several Bayesian methods have been used as well, both in dipole-fitting mod¬ 
els |271[28] and distributed source models (231130]. In (201, Friston etal. developed the multi¬ 
ple sparse priors (MSP) approach, in which they parcellate the brain in different pre-defined 
regions and promote all the dipoles in each region to be active or inactive jointly. Doing this 
they encourage the brain activity to extend over an area instead of being focused in point¬ 
like sources. Conversely, we are mainly interested in estimating point-like focal source ac¬ 
tivity which has been proved to be relevant in clinical applications (JI]. In order to do this, 
we will consider each dipole separately instead of grouping them together. Note that this ap¬ 
proach avoids the need of choosing a criterion for brain parcellization as required in the MSP 
method. 

This report introduces a new hierarchical Bayesian model that estimates the brain activity 
with a structured sparsity constraint by using a multivariate Bernoulli Laplace prior (approx¬ 
imating the weighted £20 mixed norm). Since the parameters of the proposed model can¬ 
not be computed with closed-form expressions, we investigate Markov chain Monte Carlo 
sampling techniques to draw samples that are asymptotically distributed according to the 
posterior of the proposed model. We then estimate jointly the brain activity, the model pa¬ 
rameters and hyperparameters in an unsupervised framework. In order to avoid the sampler 
to get stuck around local maxima, specific Metropolis-Hastings moves are introduced, allow¬ 
ing new modes of the posterior to be explored. These moves are based on multiple dipole 
shifts (moving active dipoles to neighboring positions) and inter-chain proposals (exchang¬ 
ing samples between parallel MCMC chains) that significantly accelerate the convergence 
speed of the proposed sampler. These proposals generate candidates that are accepted or 
rejected using a Metropolis-Hastings criterion. The method is applied to both synthetic and 
real data showing promising results compared to the more traditional £21 mixed norm and 
the MSP method. 

The report is organized as follows: Section|^describes the considered problem. The proposed 
Bayesian model is presented in Sectionj^ Sectionj^introduces the partially collapsed Gibbs 
sampler used to generate samples distributed according to the posterior of this model. The 
Metropolis-Hastings moves that are used to accelerate the convergence of the sampler are 
introduced in Sectionj^ Experimental results conducted for both synthetic and real data are 
presented in Section]^ Conclusions are finally reported in Section Appendices]^ and 
include the algebraic derivations of the conditional distributions used in the Gibbs sampler 
and the acceptance rate of the Metropolis-Hastings moves respectively. 

2. Problem STATEMENT 

The EEG source localization is an inverse problem that consists in estimating the brain ac¬ 
tivity of a patient from EEG measurements taken from M electrodes during T time samples. 
In a distributed source model, the brain activity is represented by a finite number of dipoles 
located at fixed positions in the brain cortex. More precisely, we consider N dipoles located 
in the surface of the brain cortex and oriented orthogonally to it (motivations for this can be 
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found in (32]). The EEG measurement matrix Y e ^ can be written 

Y = HX + E (2.1) 

where X e contains the dipole amplitudes, H e represents the head operator 

(usually called “leadheld matrix”), and E is the measurement noise. Denote as rrii the i-th 
row of the matrix M and as its j -th column. 

Thus, the EEG source localization problem consists in estimating the matrix X from the 
known operator H and the measurements Y. 

The next section introduces the hierarchical Bayesian method proposed to solve this inverse 
problem. 


3. Bayesian MODEL 
3.1. Likelihood 

As is classical in the literature, we consider an additive white Gaussian noise with a constant 
variance over the considered time samples (3) . Note that when this assumption does not 
hold it is possible to estimate the noise covariance matrix from measurements that do not 
contain the signal of interest and use it to whiten the data (33). This leads to a gaussian like¬ 
lihood 


T 

f(Y\X,al) = f] Hx^g^Im] 

t=i ^ 


'where Im is the identity matrix of size M x M. 


(3.1) 


3.2. Prior DISTRIBUTIONS 
3.2.1. Prior OF THE BRAIN ACTIVITY X 

To promote structured sparsity of the source activity, we hrst consider the weighted £20 mixed 
pseudo-norm 


\\X\\20 = #{i■.^/ly\\xi\\2 7t0] (3.2) 

where = 11 h' 1 12 is a weight introduced to compensate the depth-weighting effect (51[T5] and 
#5^ denotes the cardinal of the set 5^. Since this prior leads to intractable computations, we 
propose to approximate it by a multivariate Laplace Bernoulli prior for each row of X 


f[Xi\zi,A) oc 


S[Xi) ifzi = 0 

exp|-}vTi7l|a;dl2) hzi = l 


(3.3) 


where A is the exponential distribution parameter and z e {0,1}^ is a vector indicating if the 

rows of X are non-zero. To make the analysis easier it is convenient to dehne the hyperpa- 
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rameter a= which transforms the prior to 
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if Zi = 0 
if Zi = 1 


(3.4) 


f{Xi\zi,a,an) oc 


5{Xi) 

exp( 


I I.T. II 

■2. I I ® I 112 


The elements zi are assigned a Bernoulli prior distribution with parameter oi e [0,1]: 


f{Zi\a)) = SS^Zi\(jD^ (3.5) 

Note that the Dirac delta function 5 (.) in the prior of Xi promotes sparsity while the Laplace 
distribution regulates the amplitudes of the non-zero rows. The parameter o) tunes the bal¬ 
ance between them. Indeed, w = 0 yields X = 0 whereas o) = I reduces the prior to the 
Bayesian formulation of the group-lasso |34l. Unfortunately the prior l |3.4t leads to an in¬ 
tractable posterior. It is possible to fix this problem by introducing a latent variable vector 
G (IR''')^ as proposed in (35). More precisely, we use the following gamma and Bernoulli- 
Gaussian priors on and Xi respectively 


/(rflfl) = 


r -t-1 Via\ 
2 


(3.6) 


f{Xi\Zi,T],o‘i) = 



0 (T^ 
yjyU nt' i 


if Zi = 0 

]h] = l 


(3.7) 


which can be shown to lead to the desired marginal distribution of Xi l |3.4t 1551 . 

In addition assuming the rows of z, and X apriori independent leads to the following 
priors: 


N 


/(T^ifl)=n/(T^ifl) 


N 


fiz\a)) = n 


N 


f{X\z,T^,al) = Y[fiXi\zi,T^^,al) 


3.2.2. Prior of the noise variance activity 
The noise variance is assigned a Jeffrey’s prior 

/(c7^)oc ^lR+(a^) (3.8) 

where 1 r+ (^) = 1 if (f e IR'*' and 0 otherwise. This choice is very classical when no information 
about a scale parameter is available (see |51] for details). 
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3.3. Hyperparameter PRIORS 


The proposed method allows one to balance the importance between sparsity of the solution 
and fidelity to the measurements using two hyperparameters: 1) to that adjusts the propor¬ 
tion of non-zero rows, and 2) a that controls the amplitudes of the non-zeros. The corre¬ 
sponding hierarchy of parameters and hyperparameters is shown in Fig. 3.1 In contrast to 
the £21 mixed norm our algorithm does not require to adjust these hyperparameters but is 
able to estimate their values from the data by assigning hyperpriors to them following a so- 
called hierarchical Bayesian analysis. This section defines the priors assigned to the model 
hyperparameters. 



Figure 3.1: Directed acyclic graph for the proposed Bayesian model. 


3.3.1. Hyperprior OF fl 


A conjugate gamma prior is assigned to a 


f{a\a,p) = ^\a a,(3^ 


(3.9) 


with a = p = 1. These values of a and )S yield a vague hyperprior for a. The conjugacy of 
this hyperprior will make the analysis easier in the sense that the conditional distribution of 
a required in the Gibbs sampler will also be a gamma distribution. 


3.3.2. Hyperprior OF fc) 

A uniform prior on [0, 1] is used for w 

f{a)) = ‘^[a) 0,1) 

reflecting the absence of knowledge for this hyperparameter. 


(3.10) 
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3.4. Posterior DISTRIBUTION 

Using the previously described priors and hyperpriors, the posterior distribution of the pro¬ 
posed Bayesian model is 

f[Y,a^„,X, z, a, w) oc f(Y\X,a \)/(X |t^, 2 , cr^) f{z\cu) /(t^| a) fia^) f{a]f{(o] (3.11) 

The following section investigates a partially collapsed Gibbs sampler that is used to sample 
according to the posterior distribution jSTl) and to build estimators of the unknown model 
parameters and hyperparameters using these generated samples. 

4. A Partially collapsed Gibbs Sampler 

The posterior distribution l |3.11| l is intractable and does not allow us to derive closed-form 
expressions for the estimators of the different parameters and hyperparameters of the pro¬ 
posed model. Thus we propose to draw samples from <3.1U and to use them to estimate the 
brain activity jointly with the model hyperparameters. More precisely, we investigate a par¬ 
tially collapsed Gibbs sampler that samples the variables Zi and xt jointly in order to exploit 
the strong correlation between these two variables. If we denote by X-i the matrix X whose 
ith row has been replaced by zeros, the resulting sampling strategy is summarized in Algo¬ 
rithm The corresponding conditional distributions are shovm hereafter and their exact 
derivation can be found in Appendrx|^ 

Algorithm 1 Partially Collapsed Gibbs sampler. 

Initialize X = 0 and z = 0 

Sample a and from their prior distributions 

repeat 

Sample cr^ from / | X, X, , z) 

Sample o) from /{ia|z) 
for i = 1 to Af do 

Sample t? from/{T^|a;,',a^,a,Z;) 

Sample z,- from f{Zi\Y,X-i,al,T^^,o)] 

Sample a;, from / (a; d X, X- , t?) 

end for 

Sample a from fia\T^) 
until convergence 


4.0.1. Conditional distribution of t? 

The conditional distribution of t? is a gamma distribution or a generalized inverse Gaussian 
distribution depending on the value of Zi. More precisely 


f[T^-\xi,al,a,Zi) = 




r+l Via 
2 ’ 2 




2 y Rz 

^ Ur, 


if Zi = 0 
ifzi = 1 . 


(4.1) 
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4.0.2. Conditional distribution of Xj 


The conditional distribution of the ith row of X is 




if Zi = 0 


Xi ifz; = l. 

V ^ / 


aUhViY-HX- 


iJ 2 _ ^ n*' i 

‘ 1 + 


4.0.3. Conditional distribution of Zi 
The conditional distribution of Zi is a Bernoulli distribution 

f{Zi\Y,X-i,al,T^i,a)) = Ss{zi 1 , ] 

' ICq + Ki' 


T 

r2^2 -2 


ko = l-(ji),ki = (ji)i —exp -^ . 

rr. ?(T. 


4.0.4. Conditional distribution of a 
The conditional distribution of a|T^ is the following gamma distribution 

2 ( N[T+1) LilViT^A ^ 

fia\T^) = ---+ a, ---+ )Sj. 


4.0.5. Conditional distribution of cr„ 

The distribution of cr^11^, X, 2 is the following inverse gamma distribution 


/(cr^|X,X,T^,2) = J'^\a 


2 |(M+||z|io)r 1 




i^h 


4.0.6. Conditional distribution of o) 


Finally, to|z has the following beta distribution 


f{oj\z) = S^e\a) 1 + ||z||o,l +Af- llzllo). 
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Figure 4.1: Example of posterior distribution of 2 with local ma xi ma 
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(a) Ground truth - Axial, coronal and sagittal views respectively 
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(b) Estimation without proposals - Axial, 
views respectively 

coronal and sagittal 
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(c) Estimation with proposals - Axial, coronal and sagittal views 
respectively 

Figure 4.2: Illustration of the effectiveness of the multiple dipole shift proposals. 


9 

















5. Convergence CONSIDERATIONS 


5.1. Local MAXIMA 

We have observed that the proposed partially collapsed Gibbs sampler may be stuck around 
local maxima of the variable 2 : from which it is very difficult to escape in a reasonable amount 
of iterations. Fig. |4.1| illustrates via a simple example that if the sampler gets to jz = {0,1} it 
requires to go through intermediary states with very low probability to move to the correct 
value z = {1,0}. This kind of situation can occur if the dipoles corresponding to Zi and zz 
produce similar measurements Y when active so that the probability of having either of them 
active is much higher than having them both on (or off) at the same time. 

5.2. Multiple dipole shiet proposals 

In order to solve this problem, we introduce a proposal making scheme, that consists in 
changing several elements of z simultaneously (which would allows us to go from {0,1} to 
{1,0} in one step in the previous example) after each sampling iteration. The proposal is 
accepted or rejected using a Metropolis-Hastings criteria which guarantees that the target 
distribution is preserved. 

We have observed that one of the most common ways for the algorithm to get stuck in local 
maxima is by failing to estimate the position of the non-zero elements of z. In other words, 
the sampling scheme detects a non-zero in a position that is not correct but is close (in the 
sense that it produces similar measurements) to the correct one, which causes the problem 
described above. 


5.2.1. Shift based proposals 

Before describing the proposed move, it is interesting to mention that it was inspired by an 
idea developed by Bourguignon et al in (37) to perform a spectral analysis of astrophysical 
data obtained after irregular sampling. The authors of (371 proposed to move a single non¬ 
zero element of a binary sequence to a random neighboring position after each iteration of 
the MCMC sampler. In the presence of a single non-zero element, this move is sufficient 
to escape from a local maximum of the posterior associated with our EEC source localiza¬ 
tion model. However, when there are several non-zero elements located at wrong locations, 
proposing to move each one of them separately may not be sufficient to escape from a local 
maximum. For this reason, we have generalized the scheme presented in (37| by proposing 
to move a random subset of K estimated non-zeros simultaneously to random neighboring 
positions. According to our experiments (some of them described in Section [^, the simple 
choice K = 2 provides good results in most practical cases. Since there is a high correlation 
between the variables and z, it is convenient to update their values jointly. The resulting 
multiple dipole shift proposal is detailed in Algorithm]^ Note that Bourguignon et al work 
with 1-dimensional data so they define the neighborhood of the element as {Zfc_i, z^+i}. In 
contrast, we are working with dipoles located in a 3-dimensional brain so the neighborhood 
definition is non-trivial and will be described in the following. 
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Algorithm 2 Multiple dipole shift proposal. 
z = z 

repeat K times 

Set indoid to be the index of a random non-zero of 2 
Set p = [indoid, neigh^ (indow)] 

Set indnew to be a random element of p 
Set ■Zindoid ~ ^ and Zlndnew ~ ^ 

end 

Sample X from/(X| z,X,c7^,t^). 

Sample f^ from fif^\X,a^,a,z). 

Set {z,T^} = {z,f^} with probability min| ij 

Resample X if the proposal was accepted 


In Fig. 4.2 we can see the effect of introducing multiple dipole shift proposals (with K = 2)m. 
a practical case. The first row of images is the ground truth (a single dipole activation) while 
the second row shows the probability of finding each dipole active with eight MCMC parallel 
chains without using proposals after 10.000 iterations. As we can see, the activity is in the 
correct area but the algorithm is not able to converge to the correct value of z. After intro¬ 
ducing the multiple dipole shift proposals the sampler converges in less than 1.000 iterations 
as shown in the third row of the figure. 


5.2.2. Acceptance PROBABILITY 

In order to guarantee that the generated samples are asymptotically distributed according 
to the posterior of the proposed model, all moves resulting from the multiple dipole shift 
proposal are accepted or rejected with the acceptance probability described in Algorithm]^ 
that uses the following probability distribution 


rCf j 

fiZr,Tl\Y,a,al,o)](x [I -0))^^°[a^r~ (5.1) 

n(rfrfexp(-^^)n»(4 

le/i ^ i=l 



where r = {i: Zi ^ z/}, Ik = U ■ = k], Q = #Ik for k = {0,1} and 


= _ 

^ 2 
(Ji 


+diag[^^ 






at 


f [H [H '^xL -y‘) ^ j ^ 

cri 


Note that m_s is obtained after removing in the vector m all the rows belonging to the set 
s, M~^ is the matrix M whose columns belonging to s have been removed, diag{s) is the 


11 














diagonal square matrix whose diagonal elements are the elements of s and \M\ is the deter¬ 
minant of the matrix M. The exact derivation is included in Appendix]^ 


5.2.3. Neighborhood DEFINITION 


It is obvious that the definition of the neighborhood used to exchange non-zero elements is 
crucial. Initially, we used a geometrical neighborhood, defined in terms of vertex connexity in 
the triangular tessellation modeling the brain cortex. However, this definition usually yields 
very small neighborhoods. This may cause the proposals not to be flexible enough to help 
the algorithm escape from local maxima. 

For this reason we propose a neighborhood definition that considers two dipoles to be neigh¬ 
bors if the correlation between their respective columns is higher than a certain threshold 


neigh^(i) = jjV i |corr(h',h-')] > yj 


(5.2) 


where corr(ui,U 2 ) is the correlation between vectors and V 2 and where the neighborhood 
size can be adjusted by setting 7 e [ 0 , 1 ] (y = 0 corresponds to a neighborhood containing all 
the dipoles and y = 1 corresponds to an empty neighborhood). 

An additional advantage of this definition is the fact that it allows the approach to be extended 
to other kind of inverse problems (different from EEC) where no geometrical disposition of 
the elements of z may be available. 

In order to maximize the moves efficiency, the value of y has to be selected carefully. A very 
large value of y will result in proposals not being flexible enough to help the algorithm in 
escaping local maxima. A very low value of y will result in a very large amount of possible 
proposals with many of them being useless leading to a large number of iterations to reach 
useful moves. Our experiments have shown that a good compromise is obtained with y = 0.8 
(see Section|^for illustrations). 


5.3. Inter-chain PROPOSALS 

The multiple dipole shift proposal scheme previously described allows the algorithm to bet¬ 
ter sample the value of z present in the posterior distribution and is able to find the active 
dipoles correctly if they are few in number. However, when running multiple MCMC chains 
in parallel with a higher amount of non-zeros present in the ground truth, it is possible for 
the different chains to get stuck in different values of z. In order to help them converge to the 
same (most probable) value, it is possible to exchange information between parallel chains to 
avoid local maxima during their runs as other approaches do, including Metropolis-coupled 
MCMC |38], Population MCMC (39) and simulated tempering (4011^ . 

In this report, we introduce inter-chain moves by proposing to exchange the values of z and 
between different chains. This move is accepted with the Metropolis-Hastings probability 
shown in Algorithm]^ 

One inconvenience introduced by inter-chain proposals is the fact that they require synchro¬ 
nizing the parallel MCMC chain processes, which decreases the iteration speed of the al¬ 
gorithm. In order to minimize this effect, an inter-chain proposal will be made after each 
iteration with probability p (adjusted to by cross validation) according to Algorithm]^ 
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Algorithm 3 Inter-chain proposals. 

Define a vector c = {1,2,L} where L is the number of chains 
for i = {1,2,...,L} 

Choose (and remove) a random element from c and denote it hy k 
Denote as the sampled values of {z,t^} of MCMC chain number 

For the chain #i set {z;, t?} = {zjc, t|} with probability 
Resample X if the proposal has been accepted 

end 
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(a) Ground truth - Axial, coronal and sagittal views respectively 
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(b) Estimation without proposals - Axial, 
views respectively 
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(c) Estimation with proposals - Axial, coronal and sagittal views 
respectively 

Figure 5.1: Illustration of inter-chain proposals effectiveness. 
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The benefit of introducing inter-chain proposals is illustrated in Fig. 5.1 The first row of 


images of this figure displays the five non-zeros pressent in the ground truth. Without using 
inter-chain proposals the different chains arrive to different modes after 100.000 iterations as 
illustrated in the second row (which displays the probability of finding each dipole active with 
8 parallel MCMC chains). The introduction of inter-chain proposals causes all the different 
chains to converge to the same (correct) value of 2 ; in less than 5.000 iterations as illustrated 
in the third row. 


5.4. Estimators 

The point estimators used in this study are defined as follows 

:(#.^(z)) 


z = argmaxj 

ze{0,l}~ 


.. A 

V = 


1 


Jim 


(m) 


(5.3) 

(5.4) 


where M (z) is the set of iteration numbers m for which the sampled variable z^'”^ = z after 
the burn-in period and p stands for any of the variables X., a, a\, o) and t^. Thus the estima¬ 
tor z <5.3| l corresponds to a maximum a posteriori estimator whereas the estimator used for 
all the other sampled variables l |5.4t is a minimum mean square error (MMSE) estimator. 
However, it is interesting to note that the proposed method provides the full distribution of 
the unknown parameters and is not limited to point-estimate as the methods based on the 
£21 mixed norm. 

It will be shown in the next section that in certain conditions, such as low SNR, the a-posteriori 
distribution of z has several values of comparable probability. These values of z are usually 
minor variations of each other (changing one of the dipoles to a neighboring position for in¬ 
stance). In this case, after convergence the algorithm oscillates between several values of z 
which allows the proposed method to identify several possible solutions (each of them cor¬ 
responding to a different value of z) with their corresponding probabilities. This is an advan¬ 
tage over the mixed £21 mixed norm method that is only able to provide a point-estimate of 
the solution. 


6 . Experimental RESULTS 
6.1. Synthetic DATA 

Synthetic data are first considered to compare the £21 mixed norm approach with the pro¬ 
posed method using a 212-dipole Stokthree-sphere head model HU with41 electrodes. Three 
kinds of activations are performed: (1) three dipoles with low SNR, (2) five dipoles with high 
SNR and (3) multiple dipoles with high SNR. 
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(a) Ground truth (b) ^21 -mixed norm estimation 

Figure 6.1: Estimated waveforms for three dipoles with SNR = 30dB. 
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(a) Ground truth - Axial, coronal and sagittal views respectively 
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(b) ^21 - Axial, coronal and sagittal views respectively 
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(c) Proposed method - Axial, coronal and sagittal views respec¬ 
tively 

Figure 6.2: Estimated activity for three dipoles and SNR = 30dB. 


15 



















(b) ^ 21 'mixed norm estimation 



Figure 6.3: Estimated waveforms for three dipoles with SNR = -3dB. 
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(a) Ground truth - Axial, coronal and sagittal views respectively 
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(b) ^21 - Axial, coronal and sagittal views respectively 
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(c) Proposed method - Axial, coronal and sagittal views respec¬ 
tively 

Figure 6.4: Estimated activity for three dipoles and SNR = -3dB. 
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Active non-zeros 

Percentage of samples 

1,2,3 

43% 

1,2,4 

22 % 

1,2,5 

11 % 

1 , 2,6 

7% 

1,2,7 

6 % 

Others 

11 % 


Table 6.1: Three dipoles with SNR = -3dB: modes explored after convergence. Positions 1, 2 
and 3 correspond to the non-zero elements of the ground truth. 




(a) Dipole waveform 1 (b) Dipole waveform 2 


(c) Dipole waveform 3 


Figure 6.5: Estimated boundaries fi±2a for the three dipole simulation with SNR = -3dB. 


6.1.1. Three-dipoles WITH LOW SNR 


Three dipoles were assigned excitations defined as synthetic damped sinusoidal waves with 
frequencies between 5 and 20Hz. These excitations were 500ms long (a period correspond¬ 
ing to a stationary dipole activity) and sampled at 200Hz. Different levels of noise were used 
to compare the performance of the proposed method with the weighted £21 mixed norm. 
The parameters of our multiple dipole shift proposal were set to K = 2 and 7 = 0.8 and eight 
MCMC chains were run in parallel. For the £21 mixed norm approach, the value of the regu¬ 
larization parameter A was chosen using cross-validation to get the best possible result. 

The results for SNR = 30dB are shown in Fig. |6.1| and Fig. |6.2[ Both algorithms seem to provide 
the same solution that is equal to the ground truth. The only minor difference is that the £21 - 
mixed norm regularization presents some dipoles (around ten) with very low but non-zero 
values, while our algorithm only detects the three non-zeros as real activity, this can be seen 


in the estimated dipoles locations in Fig. 6.2 


The estimated dipole locations with SNR = -3dB are shown in Fig. |6.4| whereas the corre¬ 


sponding estimated waveforms are shown in Fig. 6.3 Note that only the dipoles with highest 
activity are displayed for the £21 approach. The approach based on the £21 norm manages 
to recover only two of the three non-zero activities at the correct positions and seems to un¬ 
derestimate considerably the amplitude of the activity. This is a known problem caused by 
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600 



Figure 6.6: Three dipoles with SNR = -3dB: histograms of the hyperparameters. The actual 
values of and are marked with a red vertical line. 






Figure 6.7: Three dipoles with SNR = -3dB: PSRFs of sampled variables. 


approximating the (q pseudo-norm by the norm, since the later penalizes high amplitudes 
while the former penalizes all non-zero values equally. Our algorithm oscillates between sev¬ 
eral values of 2 : (specified in Table |6.l| . However, the most probable value of 2 : found by the 
algorithm is the correct one whereas the other ones have one of the active non-zeros moved 
to a close neighbor. 

The proposed method does not only allow a point-estimation of the activity but can also 
be used to estimate uncertainties associated with the activity. For instance. Fig. 6.5 shows 
the confidence intervals of the activity estimation (mean ± standard deviations). The actual 
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(a) Ground truth - Axial, coronal and sagittal views respectively 

• 

• 

• 

(b) (21 dipole locations - Axial, coronal and sagittal views re¬ 
spectively 

• 

• 

• 


(c) Proposed Method - Axial, coronal and sagittal views respec¬ 
tively 

Figure 6.8: Estimated activity for five dipoles and SNR = 30dB. 


ground truth activation is clearly located within two standard deviations of the estimated 
mean value obtained with the proposed algorithm. The histogram of the generated hyper¬ 
parameters 0 ), a and tr^ are shown in Fig. 


6.6 


They are clearly in good agreement with the 
actual values of the corresponding parameters. The PSRF’s are displayed in Fig. 6.7 It is pos¬ 
sible to see that the PSRF’s tend to 1 as the iterations increase, showing that the simulation is 
converging correctly. 


6.1.2. Five DIPOLES 

On our second kind of experiments, five dipoles were activated with the same damped sinu¬ 
soidal wave of 5Hz. The activations were sampled at 200Hz and scaled in amplitude so that 
each of them produced the same energy in the measurements. Noise was added to the mea¬ 
surements to obtain SNR = 30dB. For the £21 mixed norm regularization the regularization 
parameter was set according to the uncertainty principle which consists in finding a solution 
X such that (43) . Eight MCMC chains were run in parallel for the 

proposed method. Only the five non-zeros of the estimated activity with highest energy in 
the measurements were considered. 
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Figure 6.9: Estimated waveforms for five dipoles with SNR = 30dB. Green represents the 
ground truth, blue the £21 mixed norm estimation and red the proposed method. 
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(a) Recovery rate as a function of P 



(b) Proportion of residual energy as a function of P 


Figure 6.10: Performance measures for multiple dipoles. 


The results are displayed on Fig. 6.8 and Fig. 6.9 In the first one we are able to see that 


the proposed method is able to recover the five locations perfectly while the ^21 norm only 
detects four activations, two of which are not in the correct locations. In the waveforms dis¬ 
played in Fig. |6.9| we can see that, while both methods are able to recover the general activity 
pattern, the proposed method closely matches the waveform amplitude while the ^21 mixed 
norm does not. This is partially due to the fact that it detects some of the activations in the 
wrong positions (waveforms 4 and 5) and partially due to the tendency to underestimate the 
activation amplitudes inherent to the £0 to £\ convex relaxation. 


6.1.3. Multiple DIPOLES 

In this section, we compare the detection capabilities of the algorithm with respect to the 
^ 21 -mixed norm approach by varying the amount of non-zeros present in the ground truth. 
In each simulation of this section, P dipoles were activated with damped sinusoidal waves 
with frequencies varying between 5 and 20Hz. The activations were sampled at 200Hz and 
scaled in amplitude so that each of them produced the same energy in the measurements. 
Fifty different sets of localizations were used for the non-zero positions for each value ofP = 
1,...,20, resulting in a total of 1000 experiments. Noise was added to the measurements to 
obtain SNR = 30dB. For the ^21 mixed norm regularization the regularization parameter was 
set according to the uncertainty principle. 

For each simulation, the P non-zeros of the estimated activity associated with the highest 
energy in the measurements were considered as the estimated activity whereas the other el¬ 
ements were considered as residuals. We define the recovery rate as the proportion of non¬ 
zeros in the ground truth that are also present in the estimated activity. The average recovery 
rates of the proposed method and the ^21 mixed norm approach are presented in the first 


plot of Fig. 6.10 as a function of P. For P < 10 our algorithm detects the non-zeros with an 


accuracy higher than 90% which drops to 60.2% for P = 11 and 49.7% for P = 12. This drop 
of the recovery rate when a large number of non-zeros is present in the ground truth is well 
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(a) EEG whitened measurements 


(b) ^ 21 -niixed norm estimated wave-(c) Proposed method estimated wave¬ 
forms forms 


Figure 6.11: Measurements and estimated waveforms for the auditory evoked responses. 


known, since the possible amount of non-zeros to recover correctly is limited by the span of 
the operator (14]. For comparison, the ^21 mixed norm regularization recovers up to P = 5 
non-zeros with an accuracy higher than 90% and its recovery rate decreases slowly to reach 
64% for P = 10. Note that our method performs better than the ^21 approach for P < 11. How¬ 
ever, beyond this point, the performance of both methods is very poor preventing them from 
being used in practical applications. 

The recovery rate is calculated from the P main non-zero elements of the activity. However, it 
is also interesting to analyze how much activity is present in the residual non-zero elements. 
Thus, we define the proportion of residual energy as the amount of energy contained in the 
measurements generated by the residual non-zeros with respect to the total energy in the 
measurements. This residual energy serves as a measure of the sparsity of the solution. The 
second plot of Fig. [6.10 shows the value of the residual energy obtained for both algorithms 
as a function of P. The ^21 approach has up to 7.7% of the activity detected in residual non¬ 
zeros whereas our algorithm never exceeds 1.1% and always has lower residual activity than 
^ 21 . confirming its good sparsity properties. 


6.2. Real DATA EXPERIMENT 

Two real data sets are then considered to validate the proposed method. The first dataset 
corresponds to the auditory evoked responses to left ear pure tone stimulus while the second 
one consists of the evoked responses to facial stimulus. The results of the proposed method 
are compared with the weighted ^21 mixed norm (T7| and the multiple sparse priors method 

(29). 


6.2.1. Auditory EVOKED RESPONSES 

The default data set of the MNE software dH |45] is used in this section. It consists of the 
evoked response to left-ear auditory pure-tone stimulus using a realistic BEM (Boundary ele¬ 
ment method) head model sampled with 60 EEG electrodes and 306 MEG sensors. The head 
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(a) Weighted ^21 mixed norm - Uncertainty principle of parameter 



(b) Weighted ^21 mixed norm - Manual adjustment of parameter 



(c) Proposed method 



(d) MSP algorithm 

Figure 6.12: Estimated activity for the auditory evoked responses. 
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Figure6.13: Estimated waveforms mean and boundaries ii±2a for the auditory evoked re¬ 
sponses. 
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Figure 6.14: Hyperparameters histograms for the auditory evoked responses. 
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(b) PSRFoffl 
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(d) Maximum PSRF of X 
Figure 6.15: PSRFs of sampled variables for the auditory evoked responses . 


model contains 1.844 dipoles located on the cortex with orientations that are normal to the 
hrain surface. Two channels that had technical artifacts were ignored. The data was sam¬ 
pled at 600Hz. The samples were low-pass filtered at 40Hz and downsampled to 150Hz. The 
noise covariance matrix was estimated from 200ms of the data preceding each stimulus and 
was used to whiten the measurements. Fifty-one epochs were averaged to calculate the mea¬ 
surements Y. The activity of the source dipoles was estimated jointly for the period from 
0ms to 500ms after the stimulus. From a clinical perspective it is expected to find the brain 
activity primarily focused on tbe auditory cortices that are located close to the ears in both 
hemispheres of the brain. 

Since tbe measurements were whitened, it is possible to use the uncertainty principle to ad¬ 
just the hyperparameter of the (21 mixed norm. However, this provides an activity distributed 
all over the brain as shown in the first row of Fig. |6.12| By manually adjusting the hyperpa¬ 
rameter to produce a sparser result, the (21 mixed norm can obtain a solution that has activ¬ 
ity in the auditory cortices as expected, shown in the second row of images. In contrast, our 
algorithm estimates its hyperparameters automatically and finds most of the activity in the 
auditory cortices without requiring any manual adjustment as displayed in the third row. On 
the other hand, the MSP method spreads the activity around the auditory cortices area since 
it groups the dipoles together in pre-defined regions. 

The whitened measurements are displayed in Fig. |6.11| along with the activity estimation for 
both the (21 approach (after manually adjusting the hyperparameter) and our algorithm. 
The five waveforms estimated by the proposed method with their confidence intervals of 2c7 
are shown in Fig. 6.13 Both results present sharp peaks in the activations of the auditory cor¬ 
tex dipoles between 80 and 100 milliseconds after the application of the stimulus. Note that 
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(a) EEG whitened measurements (b)/ 21 'mixed norm estimated wave-(c) Proposed method estimated wave¬ 
forms forms 


Figure 6.16: Measurements and estimated waveforms for the facial evoked responses. 


the amplitudes estimated hy the proposed method are much higher than the ones obtained 
with the ^21 approach due to the aforementioned amplitude underestimation of the latter. 


The histograms of the hyperparameters of our algorithm are presented in Fig. 6.14 while their 


PSRF’s are shown in Fig. 6.15 In the PSRF’s we can see very abrupt changes in values for all 
the variables (most noticiably for X) in the same iterations. These correspond to the itera¬ 
tions in which proposals were accepted by different chains and reflect the fact that the PSRF’s 
can have very high values while the chains are in different modes. However, at the end of the 
simulation all chains converge to the same value of 2 : which causes all the PSRF’s to tend to 
1, showing the correct convergence of all the chains to the same posterior distribution. 


6.2.2. Facial evoked responses 

In a second experiment, we used data acquired from a face perception study where the sub¬ 
ject was required to evaluate the symmetry of a mixed set of faces and scrambled faces, 
one of the default datasets of the SPM softwar^ Faces were presented during 600ms ev¬ 
ery 3600ms. The measurements were taken by the electrodes of a 128-channel ActiveTwo 
system that sampled at 2048 Hz. The measurements were downsampled to 200Hz and, after 
artifact rejection, 299 epochs corresponding to the non-scrambled faces were averaged and 
low-pass filtered to 40Hz. A T1 MRl scan was then downsampled to generate a 3004 dipole 


head model. The estimated activities are shown in Fig. 6.17 As in the previous case, we can 


see that the ^21 mixed norm response (with the regularization parameter adjusted according 
to the uncertainty principle) estimates the activity spread around the brain. In contrast, ad¬ 
justing its regularization parameter manually results in a focal response concentrated in one 
of the fusiform regions in the temporal lobe associated with the facial recognition process 
(461 similar to the one obtained by our algorithm. The MSP algorithm spreads the activity 
over brain regions located more to the lateral and posterior parts of the brain, further away 
from the expected area. 


^The SPM software is freely avaiable at http://www.fil.ion.ucl.ac.uk/spm 
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(a) Weighted (21 mixed norm - Uncertainty principle for parameter 



(b) Weighted ^21 mixed norm - Manual adjustment of parameter 



(c) Proposed method 



(d) MSP algorithm 


Figure 6.17: Estimated activity for the facial evoked responses. 







Fig. 6.16 shows the EEG measurements and the estimated waveforms by the £21 approach 


and the proposed method. As with the auditory evoked responses data, they differ in the 
scale due to the underestimation of the activity amplitude by the £21 approach. 


6.3. Computational COST 

It is important to note that the price to pay with the proposed method, while having several 
advantages over the £21 mixed norm approach, is its higher computational complexity as 
is typical of MCMC methods when compared with optimization techniques. The low SNR 
three-dipole experiment was processed in 6 seconds running in a modern Xeon CPU E3-1240 
@ 3.4GHz processor (using a Matlab implementation with MEX hies written in C) against 
104ms for the £21 mixed norm approach. However, it is interesting to mention that the £21 
norm approach requires running the algorithm multiple times to adjust the regularization 
parameter by cross-validation. 


7. Conclusion 

We presented a Bayesian mathematical model for sparse EEG reconstruction that approx¬ 
imates the £20 mixed norm in a Bayesian framework by a multivariate Bernoulli Laplacian 
prior. A partially collapsed Gibbs sampler was used to sample from the target posterior dis¬ 
tribution. We introduced multiple dipole shift proposals within each MCMC chain and ex¬ 
change moves between different chains to improve the convergence speed. Using the gen¬ 
erated samples, the source activity was estimated jointly with the model hyperparameters in 
an unsupervised framework. The proposed method was compared with the £21 mixed norm 
and the multiple sparse prior methods for a wide variety of situations including several multi¬ 
dipole synthetic activations and two different sets of real data. Our algorithm presented sev¬ 
eral advantages including better recovery of dipole locations and waveforms in low SNR con¬ 
ditions, the capacity of correctly detecting a higher amount of non-zeros, providing sparser 
solutions and avoiding underestimation of the activation amplitude. Einally, the possibility of 
providing several solutions with their corresponding probabilities is interesting. Euture work 
will be devoted to a generalization of the proposed model to cases where the head model is 
not precisely known. 
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A. Derivation of the conditional probability distributions 


We will now proceed to show the derivation of the conditional prohability distributions of the 
associated model presented in section]^ of the report in detail. 

A.l. Posterior DISTRIBUTION 
As specified in the report, the associated posterior distribution is: 

fiY,al,X,z,a,T^,a)) = /{V|X,o-^)/(A:|T^,z,c7^)/(2:|tj)/(T^|a)/(c7^)/{a)/M (Al) 
From it we can derive the conditional distributions of all the associated parameters and hy¬ 


perparameters using Bayes’ theorem: 

fiZi,Xi\Y,X-i,al,j^^,a))(x f{Y\X,a\)f{Xi\T],Zi,a\)f{Zi\cu) {A.2) 

f{T]\Xi,o\,a,Zi)(x /(a;/|T?,z;,o-^)/(T^|fl) {A.3) 

/(fl|T^) CX {A.4) 

cx f{Y\X,a\)f{X\T'^,z,a\)f{a\) (A.5) 

/(w|z)oc/(z|w)/M {A.6) 


A.2. Conditional DISTRIBUTIONS 

Conditional distribution of t? 

The conditional distribution of t? is 

/(T^|®i,C7^,fl,Z,)oC /{®/|TpZ,,C7^)/{T^|a) {A.7) 

that is equal to 


f{Tj\xi,al,a,Zi] = 


jv[xi 0,aljpT]'S[Tj 


r+i 
2 ’ 


Vi a 
2 


r+1 Via ] 

2 ’ 2 j 


if Zi = 0 


if Zi = 1. 


(A.8) 


Based on the development of (35] it can be seen that the conditional distribution of t? is a 
generalized inverse gaussian when z, = 1 and a gamma distribution when z; = 0 


f{T^^\Xi,al,a,Zi) = 


<^(t2 


r+i 

2 

(-? 


Vi_a\ 

’ 2 j 

2 J ViClf 



if Zi = 0 


ifz; = 1. 


(A.9) 


Conditional distribution of z; and xi 

As stated in the report, we will sample Zi and Xi jointly from 


fiZi,Xi\Y,X-i,al,J^^,CD)(x f(Y\X,al)fiXi\Tj,Zi,al)fiZi\(x)) 


(A. 10) 


that is equal to 
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The marginal distribution of z, is of the form 
f[Zi\Y,X-i,al,T^^,a))= j 
with 


exp(- 

(A.11) 
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dxi. 


This implies that Zi has the following Bernoulli distribution 


f{zi\Y,X-i,al,Tj,ti)) = Ss\^Zi 


fci 


To find the value of ki we calculate the minus logarithm of the integrand 
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and express it as a sum for the different values of t 
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By denoting Dl = Y^ - expanding this expression we have 
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(A. 19) 
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Summing over all time samples and applying the function exp(-x) (to compensate the steps 
done in |A.16| and |A.18| l results in 
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We can calculate the final value of ki by combining l |A.14) and l |A.24| l 
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Using JA.11) and l |A.24t we obtain the conditional distribution of xi 

6{Xi) ifz, = 0 
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which corresponds to the following gamma distrihution 
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Conditional distrihution of a„ 

The conditional distrihution of is 
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we can express this hy 
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which corresponds to the following inverse gamma distrihution 
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where lo,i represents a function that is 1 for 0 < tj < 1 and 0 elsewhere. Using the identity 
<A.311 this can he shown to he equal to 
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which corresponds to the following Beta distrihution 

f[a)\z) = SSe[(jD 1 + ||z||o,l +Af- llzllo]. 


{A.38) 
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B. Derivation of proposal acceptance probability 


In section we presented two different kind of proposals used in the algorithm: multiple 
dipole shifts and inter-chain proposals. Both of them consists in proposing to change the 
values of the vectors z and jointly. In this appendbc we will calculate their acceptance 
prohahility. 

Using Bayes’ theorem it is easy to show that the acceptance prohahility is equal to the ratio 
of the conditional distribution f{T^,z\Y, a, evaluated in the current and proposed val¬ 
ues of the vectors. It is possible to improve the speed of the calculation by only considering 
the ith row if the current value of the element z, differs from the proposed value zP. Denot¬ 
ing this subset of rows by r we can calculate / , Xr 11^, X- r, z_ ^, t?. ^, a, cr^, a») and then 

integrate it with respect to Xr- 

Based on the posterior l |3.11h we can derive the conditional distribution of the subset r of 
rows as 

f[Tl,Zr,Xr\Y,X-r,Z-r,Ti^,a,al,a))(x /(V|X,p/{®Z,'|a^,T?,w) /(t? |fl) (B.l) 
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Denoting the subsets: I]c = {r,- : z^ = k} (Note that /q u Ji = r and Ion Ii = 0) and their 
cardinals Cfc = #Ik for k = {0,1} and using identity l |A.31| l allows us to express this as 
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We now integrate the above expression over Xr leading to 
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Evaluating xi = 0 for i e Jq because of the 5(a;;) converts this to 
/(t2,z,|.)oc 
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with 
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(B.5) 
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being the vector x^ that has all the rows in s eliminated and H-s the matrix H that has 
all the columns in s eliminated. 

I can also be expressed as 
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withC^ = + = ^^(\\H.rxL, +Hi.x^^-y^wKZiu, ^). 

Denoting = H-nx^ R~y^ this can be expressed as 
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If we denote Q = diagC^-) being diag{s) the diagonal square matrix formed by the elements 
of vector s 
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Matching each term between the previous expression and Z ^ j 
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We can now calculate the value of I 
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Finally using the value of / in l |B.5| l yields 
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